A COMPARISON OF SOME RELIABLE TEST DATA
GENERATION PROCEDURES

Richard  A.  DeMillo*,  Daniel  E.  Hocking**,  Michael  J.  Merritt* 


Abstract

A set of mutants of a program P, M(P), is a finite subset of the
set of all programs written in the language of P, and EM(P) is
the set of programs in M(P) which are (functionally) equivalent
to P. For a set of test data T, DM(P,T) is the set of programs
in M(P) which give results differing from P on at least one point
in T. A mutation score for P,T is defined as follows:

$$ ms(P,T) = \frac{|DM(P,T)|}{|M(P)| - |EM(P)|} $$

As  described  elsewhere,  it  is  possible  to  choose  the  function  M 
so  that  ms(P,T)  =  1  only  if  T  demonstrates  the  correctness  of  P 
with  high  probability. 

This paper is a case study of four test data generation
schemes. For a fixed program P, five sets of test data are
generated and mutation scores are calculated using the FMS.2
mutation system. Since each set has a score less than one, the
FMS.2 system is used to derive a set T such that ms(P,T) = 1.
There are currently many suggested procedures for choosing input
data for software testing [1,7,8,10,12,13]. In this paper we
describe an experimental technique for the relative evaluation of
different test data generation procedures, and present the
results of one such study comparing five testing methodologies.
Mutation analysis [1,4,5], a tool for the evaluation of
individual test data sets, is used to generate a mutation score,
0 ≤ ms(P,T_i) ≤ 1, where P is a program and T_i is the test data
generated by procedure i. Within certain constraints discussed
below, data sets with high mutation scores may be judged superior
to those with low scores. Mutation scores for data sets
generated by the various methodologies provide an objective
evaluation of those methodologies, when applied to the particular
program studied. Repeating this procedure with a variety of
programs would provide a tool for the overall evaluation of
testing methodologies.
SECTION 2

RELIABILITY OF TEST DATA

When a program P behaves correctly on a single test input,
t, it is differentiated from an infinite subset of Prog(P) (all
programs in the language of P), the subset of programs that
behave incorrectly on input t. Since an infinite number of
programs in Prog(P) differ from P on only one input, testing
alone cannot establish program correctness unless it is
exhaustive. This is of course impossible for most practical
situations.

Thus, testing cannot be used to establish program
correctness, but it can be used to increase confidence in a
program's correctness. Two sets of test data often differ in the
levels of confidence they engender; one is said to be more
reliable than the other. An example would be two sets of data, T
and T', such that T ⊂ T': T' is more reliable than T.

Measuring Reliability

Let M(P) ⊂ Prog(P) be a finite subset of the programs in
the language of P and let EM(P) ⊂ M(P) be the subset of M(P) of
programs equivalent to P (i.e., programs that compute the same
function). Finally, let DM(P,T) ⊆ M(P) - EM(P) be the subset of
non-equivalent programs in M(P) that behave differently than P on
some input from the set T of test data. We define the mutation
score for program P and test data set T to be the fraction of
non-equivalent programs differentiated from P by T:

$$ ms(P,T) = \frac{|DM(P,T)|}{|M(P)| - |EM(P)|} $$
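
As a minimal sketch of this definition, the score can be computed from three sets of mutant identifiers. The function name `mutation_score` and the representation of M(P), EM(P) and DM(P,T) as Python sets are assumptions made for illustration only; they are not part of any mutation system described here.

```python
def mutation_score(mutants, equivalent, killed):
    """Return |DM(P,T)| / (|M(P)| - |EM(P)|) for sets of mutant identifiers."""
    mutants, equivalent, killed = set(mutants), set(equivalent), set(killed)
    if not equivalent <= mutants:
        raise ValueError("EM(P) must be a subset of M(P)")
    non_equivalent = mutants - equivalent
    if not killed <= non_equivalent:
        raise ValueError("DM(P,T) must contain only non-equivalent mutants")
    if not non_equivalent:
        raise ValueError("score is undefined when every mutant is equivalent")
    return len(killed) / len(non_equivalent)


# Hypothetical example: ten mutants, two equivalent, six killed by T.
score = mutation_score(range(10), {0, 1}, {2, 3, 4, 5, 6, 7})
print(score)  # 0.75
```

The score reaches 1 exactly when every non-equivalent mutant has been killed.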

Notice that ms() is determined by the language and by the set
M(P). Thus, the mutation score is a practical method for
comparing the reliability of test data sets provided only the set M(P)
is chosen to meet two criteria:

I)  ms(P,T)  is  easy  to  compute,  and 
II)  confidence  in  the  correctness  of  P  increases 
as  ms(P,T)  approaches  1. 

Furthermore,  if  P  is  known  to  be  correct,  the  mutation  score  may 
be  used  to  compare  the  reliability  of  test  data  selection 
methods:  one  method  is  more  reliable  than  another  if  it  produces 
more  reliable  test  data  sets. 

Mutation  Theory 


Mutation analysis is one method of choosing M(P) to satisfy
criteria I and II above. In mutation analysis, each element of M(P)
is generated from P by introducing some small change into P; the
set M(P) is the set of mutants of P. Each change is meant to
simulate a simple programmer error [5]. These changes are
introduced according to rules called mutant operators, different
types of errors being introduced according to different operators
(a complete discussion of mutant operators appears in [1];
explicit examples of program mutants are presented later in this
paper).
The errors introduced by mutant operators simulate actual
program  errors  made  by  competent  programmers  in  practice  [1]. 
Provided  P  is  correct,  the  elements  of  M(P)  are  the  programs  with 
single  errors  that  a  competent  programmer  is  most  likely  to 
produce  in  place  of  P.  The  more  frequently  such  likely  errors 
are  detected  by  a  set  of  test  data  (the  higher  the  mutation 
score),  the  more  confidence  one  can  acquire  that  the  program  has 
no  such  single  error.  Empirical  evidence  supports  the  assumption 
that  test  data  sufficient  to  detect  single  errors  suffices  to 
detect  erroneous  programs  with  multiple  errors  as  well  [1,2,3]. 
Additional  evidence  that  this  choice  of  M(P)  satisfies  II  is  the 
observed  reliability  of  programs  tested  by  data  sets  with  high 
mutation  scores  [1,4,6]. 

Prototype automated mutation systems, described below, have
been used to compute mutation scores for a large number of
programs, in three languages [1]. Theoretical studies and
run-time observations suggest that mutation scores may be
economically computed for even large programs [6,9], so that this
choice of M(P) also satisfies I.

A mutation system generates the set M(P) by applying mutant
operators to P. Using appropriate optimizing heuristics, it then
interprets the program and its mutants, running them on test data
provided by the tester. Any mutant which produces output
differing from the original program is 'killed,' and removed from
further consideration. Some mutant programs will perform
identically to the original program on all inputs; these form
the set EM(P). As the testing process
continues, the tester may view individual mutants or apply
automated heuristics in order to determine if they are
equivalent. A program passes the mutation analysis once all
non-equivalent mutants have been killed by some input. Of course,
the original program must have been judged by the tester to have
performed correctly on all test data. If P is known to be
correct, the set of mutants killed by a set of test data T is
DM(P,T).
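
The kill loop just described can be sketched as follows. This is an illustrative toy and not the FMS.2 implementation: the original program and its mutants are modelled as hypothetical Python callables, and the set of equivalent mutants is supplied by hand, since judging equivalence is left to the tester.

```python
def run_mutation_analysis(program, mutants, test_data):
    """Return the names of mutants killed by test_data.

    program   -- callable taken as the original program P
    mutants   -- dict mapping a mutant name to a callable
    test_data -- iterable of argument tuples
    """
    live = dict(mutants)
    killed = set()
    for args in test_data:
        expected = program(*args)
        for name in list(live):
            if live[name](*args) != expected:
                killed.add(name)      # output differs from P: mutant is killed
                del live[name]        # removed from further consideration
    return killed


# Hypothetical program and three hand-written mutants of it.
def largest(a, b):
    return a if a >= b else b

toy_mutants = {
    "ge_to_le": lambda a, b: a if a <= b else b,   # relational operator changed
    "ge_to_gt": lambda a, b: a if a > b else b,    # equivalent to the original
    "b_to_a":   lambda a, b: a if a >= b else a,   # variable replaced
}

killed = run_mutation_analysis(largest, toy_mutants, [(1, 2), (2, 1)])
equivalent = {"ge_to_gt"}  # judged equivalent by inspection
score = len(killed) / (len(toy_mutants) - len(equivalent))
print(sorted(killed), score)
```

Removing a mutant as soon as it is killed mirrors the description above: later test inputs are run only against the mutants that are still live.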
SECTION 3
CASE  STUDY 


The  remainder  of  this  paper  presents  a  case  study  in  which 
five  test  data  generation  techniques  are  employed  to  generate 
five  sets  of  test  data  for  a  simple  program,  TRITYP,  which  has 
been studied elsewhere [1,5,9]. Mutation scores are assigned to
each set of data using the interactive FMS.2 mutation system [1].
As all the scores are less than 1, the mutation system is used
interactively  to  derive  a  set  T  such  that  ms(P,T)  =  1. 

Test  Data  Generation  Methods 


The  five  test  data  generation  techniques  we  study  involve 
different  analyses  of  the  program  to  be  tested.  They  are: 
Specifications,  Statement,  Branch  and  Domain  Analysis  (two 
methods  studied  are  variations  of  domain  analysis). 

Specifications  analysis  is  a  'black  box'  approach  to 
program  testing:  it  involves  no  analysis  of  the  actual  program. 
Instead, an ad hoc and intuitive analysis of the program's
specifications  is  performed.  The  tester  uses  the  specifications 
to  try  to  outguess  the  programmer  and  expose  errors.  Because  of 
the  adversary  nature  of  this  technique,  it  has  been  recommended 
that  programmers  not  test  their  own  programs,  and  even  that 
programs  be  tested  by  entirely  different  organizations  [10]. 
The next two methods are 'white box' testing techniques,
involving  explicit  and  often  complex  analysis  of  the  program 
code.  Of  these,  statement  analysis  is  the  simplest,  requiring 
only  that  a  test  data  set  cause  every  program  statement  to  be 
executed  by  at  least  one  test  input.  Automated  systems  exist 
that  backtrack  from  a  statement,  analyzing  branching  predicates 
to  produce  a  single  predicate  which,  when  satisfied  by  input 
values,  causes  the  appropriate  statement  to  be  executed  [11]. 


Branch  analysis  places  a  stronger  restriction  on  test  data 
sets,  requiring  not  just  that  every  statement  be  executed,  but 
that  every  branch  be  executed  at  least  once  [1,7,8].  Thus,  every 
branching  predicate  must  evaluate  to  TRUE  and  to  FALSE  for  some 
different  inputs  in  the  test  data  set. 
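
To make the branch criterion concrete, the sketch below records which outcome of each predicate a test set exercises. The routine `classify` is a hypothetical two-predicate fragment written for this illustration (it is not TRITYP), and the helper names are assumptions.

```python
def classify(i, j, k, seen):
    """Toy two-predicate routine; records each (predicate, outcome) pair in seen."""
    def branch(label, outcome):
        seen.add((label, outcome))
        return outcome

    if branch("i == j", i == j):
        if branch("i + j <= k", i + j <= k):
            return "illegal"
        return "isosceles"
    return "other"


def uncovered_branches(test_data):
    """Return the (predicate, outcome) pairs that no test input exercises."""
    required = {(p, o) for p in ("i == j", "i + j <= k") for o in (True, False)}
    seen = set()
    for args in test_data:
        classify(*args, seen)
    return required - seen


print(uncovered_branches([(3, 3, 8), (1, 2, 3)]))             # {('i + j <= k', False)}
print(uncovered_branches([(3, 3, 8), (1, 2, 3), (3, 3, 5)]))  # set()
```

The first test set reaches the inner predicate but only with the outcome TRUE, so it falls short of the branch criterion until an input such as (3 3 5) is added.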


Domain  analysis,  the  final  test  data  generation  strategy  we 
examine,  may  be  used  as  either  a  black  box  or  white  box  technique 
[1,12]. In the black box approach, the program specifications
are  used  to  partition  the  input  space  into  contiguous  convex 
regions,  called  domains,  on  which  the  program  is  to  compute 
different  functions.  Test  data  are  picked  from  each  domain,  each 
boundary  between  domains,  and  points  close  to  such  boundaries. 
The  white  box  approach  performs  a  similar  analysis,  but  examines 
the  domains  implicit  in  the  program  structure,  rather  than  those 
which  ought  to  exist,  given  the  program  specifications.  For 
large  numbers  of  domains  and  higher  dimensions,  the  number  of 
test  cases  required  by  the  black  box  technique  becomes 
unreasonably  large.  One  heuristic  for  decreasing  the  number  of 
test  cases  is  to  pick  them  so  as  to  satisfy  several  domain 
requirements at the same time; a single point may lie within a
domain and approach two domain boundaries, thus replacing three
separate test cases. For this study, black box domain analysis
was carried out twice, once without this heuristic and once with
it. We differentiate these slightly different techniques by
calling them Domain Analysis and Minimized Domain Analysis,
respectively.

Domain  analysis  was  applied  to  a  program  with  three  input 
variables  for  this  study  (published  examples  usually  analyse 
programs with two inputs). The analysis involves the
partitioning of the first orthant of lattice three-space, plus the origin,
into one, two and three-dimensional subsets. Figure 1 is a
representation of this partitioning. We found this partition
fairly difficult to construct. Applying this technique to
programs with more than three inputs would require partitioning
higher-dimensional spaces, while programs with inputs of
different types would require the partitioning of heterogeneous
input spaces.

The Program TRITYP


The simple FORTRAN program TRITYP in Figure 2 requires
three nonnegative integers as input, representing the relative
lengths of the sides of a triangle. An element of the set
{1,2,3,4} is output, denoting that the input triangle is
equilateral, isosceles, scalene or illegal, respectively.
Triangles with sides of zero length are legal, but other
degenerate triangles are not (e.g. 3 3 6).

The  behavior  of  TRITYP  on  negative  inputs  is  not 
consistent,  so  that  its  acceptance  of  negative  inputs  at  all  may 
be  seen  as  a  specifications  error.  For  the  purposes  of  this 
study,  therefore,  we  will  analyze  the  behavior  of  the  program  on 
nonnegative  inputs  only.  On  all  such  inputs  within  the  integer 
range  of  the  host  machine,  the  behavior  of  TRITYP  is  correct. 
The program TRITYP has been studied elsewhere [1,5,9], and a very
similar  program  was  discussed  in  [10]. 
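
For readers who want to experiment with the behavior described in this section, the following is a hedged Python transliteration of the FORTRAN listing in Figure 2. The function name `trityp` and the use of return values in place of the CODE output parameter are choices made for this sketch; the GOTO structure has been flattened into ordinary conditionals.

```python
def trityp(i, j, k):
    """Transliteration of Figure 2: 1 equilateral, 2 isosceles, 3 scalene, 4 illegal."""
    match = 0
    if i == j:
        match += 100
    if i == k:
        match += 200
    if j == k:
        match += 300
    if match == 0:                      # label 10: possible scalene
        if i + j <= k or j + k <= i or i + k <= j:
            return 4
        return 3
    if match == 100:                    # label 20: I = J
        return 4 if i + j <= k else 2
    if match == 200:                    # label 30: I = K
        return 4 if i + k <= j else 2
    if match == 300:                    # label 40: J = K
        return 4 if j + k <= i else 2
    return 1                            # all three sides equal


for sides in [(3, 3, 3), (3, 3, 5), (3, 4, 5), (3, 3, 6), (0, 0, 0)]:
    print(sides, trityp(*sides))
```

The last two inputs illustrate the remarks above: (3 3 6) is a degenerate triangle and is reported as illegal, while the all-zero input is accepted as equilateral.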

Mutation  Scores 


Five sets of test data are generated for TRITYP, one
according to each of the methods discussed above. Mutation scores are
then computed using the FMS.2 system; a summary of the results
appears in Figure 3, listed in order of increasing mutation
score.

The various mutant operators available on the FMS.2
mutation system are discussed in some detail elsewhere [1]. For this
study, all of them are applied, producing 1035 mutants of the
program TRITYP. Of these, 69 are equivalent mutants. Thus
|M(TRITYP)| = 1035, |EM(TRITYP)| = 69; that is, there are 966
non-equivalent mutants of TRITYP. The numbers of these
non-equivalent mutants killed by the various test data sets are used
to determine the respective mutation scores.
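
As a worked check of the arithmetic, the counts above can be combined with the numbers of killed mutants reported in Figure 3. The subscripts St, Sp, B, MD and D stand for the statement, specifications, branch, minimized domain and domain analysis data sets; values are rounded as in the figure.

```latex
\begin{align*}
|M(\mathrm{TRITYP})| - |EM(\mathrm{TRITYP})| &= 1035 - 69 = 966 \\
ms(\mathrm{TRITYP}, T_{St}) &= 660 / 966 \approx .68 \\
ms(\mathrm{TRITYP}, T_{Sp}) &= 792 / 966 \approx .82 \\
ms(\mathrm{TRITYP}, T_{B})  &= 821 / 966 \approx .85 \\
ms(\mathrm{TRITYP}, T_{MD}) &= 943 / 966 \approx .976 \\
ms(\mathrm{TRITYP}, T_{D})  &= 951 / 966 \approx .984
\end{align*}
```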
It is apparent from the results in Figure 3 that size alone
is not the determining factor in our measure of data set
reliability. While the largest set, T_D (domain analysis), is the
most reliable, the set T_MD (minimized domain analysis) rated
almost as high with less than half the size (this is also
evidence that the minimizing heuristic is reasonable). Similarly,
T_B (branch analysis) measured significantly better than T_Sp
(specifications analysis), with 30% fewer test cases. This
observation contradicts the
view that "the more test cases, the better," and demonstrates
that a few, well chosen test cases may be more reliable as well
as more economical than a larger set of less carefully chosen
data.


Surviving Mutants


None of the sets of test data studied killed all the
non-equivalent mutants of TRITYP. Since TRITYP is known to be
correct, each of these surviving mutants is a possible erroneous
program that would not have been detected by the test set it
survived. These surviving mutants are thus specific examples of
inadequacies  in  the  various  testing  methodologies — by  studying 
them  in  some  detail,  we  may  hope  to  discover  in  more  general 
terms  the  strengths  and  weaknesses  of  these  methodologies.  The 
remainder  of  this  section  provides  a  brief  discussion  of  the  five 
methodologies  studied  above,  in  light  of  their  surviving  mutants. 
Statement Analysis


Figure 4 provides examples of three mutants that survive
the data set T_St, as output by the FMS.2 system. The second mutant
shown, in which GOTO 60 was replaced by CALL TRAP, was not
detected because that line of code was not executed by any input in
T_St. The two lines of code

      IF((I+J).LE.K)GOTO 50
      GOTO 60

were treated as one statement during the generation of the set
T_St. This statement is executed by the input (3 3 8), but only
one branch of the predicate, to GOTO 50, is executed by that
input. It is this type of error which branch analysis attempts
to detect, by requiring that every branch be executed by some
input. The other two mutants shown were executed, but behaved
identically to TRITYP on those inputs. Thus, it may not be
enough to merely execute a statement or branch on only one input.
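
The point about executing a statement on only one input can be illustrated with mutant 425 of Figure 4, which replaces J by the constant 1 in the predicate at label 20. The sketch below compares the two predicates directly; the function names are hypothetical, and the search range is an arbitrary choice made for this example.

```python
def original_20(i, j, k):
    return i + j <= k        # 20 IF(I + J .LE. K) GOTO 50

def mutant_425(i, j, k):
    return i + 1 <= k        # 20 IF(I + 1 .LE. K) GOTO 50

def reaches_label_20(i, j, k):
    return i == j and i != k     # MATCH = 100 in Figure 2

# (3 3 8) executes the statement, yet both predicates agree on it.
print(original_20(3, 3, 8), mutant_425(3, 3, 8))   # True True
# (3 3 5) also executes it, and here the predicates disagree.
print(original_20(3, 3, 5), mutant_425(3, 3, 5))   # False True

killers = [(i, i, k) for i in range(6) for k in range(12)
           if reaches_label_20(i, i, k)
           and original_20(i, i, k) != mutant_425(i, i, k)]
print(killers[:3])   # [(2, 2, 3), (3, 3, 4), (3, 3, 5)]
```

Any input on which the two predicates disagree sends the original and the mutant to different return codes, so such an input would kill the mutant.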

Specifications  Analysis 


Three mutants surviving both T_St and T_Sp are shown in
Figure 5. Once again, the appropriate program branches are not
executed by test data, and these errors would be undetected. As
an example, the first mutant, replacing IF(I+K.LE.J)GOTO 50 with
IF(J+K.LE.J)GOTO 50, will only be detected by input with I equal
to K, and I + K ≤ J. When a tester attempts to exercise paths by
altering various input parameters, but without explicit knowledge
of the code or without tracing the logic of the code, such errors
may easily remain undetected.
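
The condition stated for the first mutant of Figure 5 can be checked by brute force over a small range of nonnegative inputs. This is a sketch with hypothetical function names; it compares only the two predicates at label 30, which is reached when I equals K and J differs from both.

```python
def original_30(i, j, k):
    return i + k <= j        # 30 IF(I + K .LE. J) GOTO 50

def mutant_150(i, j, k):
    return j + k <= j        # 30 IF(J + K .LE. J) GOTO 50

# Inputs of the form (i, j, i) with j != i reach label 30 (MATCH = 200).
killers = [(i, j, i) for i in range(5) for j in range(10)
           if i != j and original_30(i, j, i) != mutant_150(i, j, i)]

print(len(killers), killers[:3])                  # 20 [(1, 2, 1), (1, 3, 1), (1, 4, 1)]
print(all(i + k <= j for i, j, k in killers))     # True
```

In this range every killing input satisfies I + K ≤ J, in agreement with the text; the search also shows that K must be nonzero, since with I = K = 0 both predicates are true.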

Branch  Analysis 


Every branch in the program TRITYP is executed by one of
the nine inputs in T_B, and this set succeeds in killing the
mutants mentioned in the previous sections, despite being a
smaller set than T_Sp. The simple analysis of TRITYP required to
produce T_B has a payoff in high reliability with a small number
of test cases. Examples of mutants that do survive appear in
Figure 6. In the first one,

      IF(I+J.LE.K.OR.J+K.LE.I.OR.I+K.LE.J)GOTO 50
becomes
      IF(J+J.LE.K.OR.J+K.LE.I.OR.I+K.LE.J)GOTO 50.

This error is not detected because of the complexity of the
branching predicate: only a few of the subexpressions are
exercised by the test data. This is an example of an error in
processing a particular domain, as this predicate defines the
region of input space of illegal but distinct integer triples.

Domain  Analyses 


There were very few mutants that survived the sets T_MD and
T_D, and in fact those surviving T_D are a subset of those
surviving T_MD. These survivors are all shown in Figure 7. Many of these
involve the ZPUSH operator, which changes its argument only when
it is zero. It then evaluates to the largest permitted integer.
This operator is intended to explore the behavior of the program
when variables have the value zero, a frequently important
special case. The last few mutants in Figure 7 are of less
debatable significance. These are examples of simple errors in
which program constants replace variables, e.g.:

   40 IF(J + K .LE. I) GOTO 50
   40 IF(J + 3 .LE. I) GOTO 50.

As Figure 8 shows, each of these mutants computes incorrect
values for portions of two domains. Unless test data is chosen
from one or more of these regions, the errors go undetected. It
is an accident that some of these mutants were killed by each of
T_St, T_Sp, T_B and T_MD, and all of them by T_D. None of the five
test data generation schemes studied checks specifically for this
type of code-dependent error.
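
The ZPUSH operator described earlier in this section can be sketched in a few lines. The value chosen for `MAXINT` is an assumption (a 32-bit host), the function names are hypothetical, and Python integers do not overflow, so this models only the comparison and not any machine-level effect.

```python
MAXINT = 2**31 - 1   # assumed largest integer of a hypothetical 32-bit host

def zpush(x):
    """Return x unchanged unless it is zero, in which case return MAXINT."""
    return MAXINT if x == 0 else x

def original_20(i, j, k):
    return i + j <= k                  # 20 IF(I + J .LE. K) GOTO 50

def mutant_879(i, j, k):
    return zpush(i) + j <= k           # 20 IF(ZPUSH I + J .LE. K) GOTO 50

print(original_20(3, 3, 8) == mutant_879(3, 3, 8))   # True: no zero operand
print(original_20(0, 0, 4) == mutant_879(0, 0, 4))   # False: I = 0 exposes it
```

Under these assumptions a ZPUSH mutant of this kind can only be distinguished from the original by an input in which the pushed variable is zero, which is why test sets without such inputs leave it alive.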

Intuitively,  domain  analysis  is  a  stronger  technique  than 
statement  or  branch  analysis,  and  our  study  quantifies  this 
qualitative  appraisal. 

Test Data Generation Using the Mutation System

The  mutation  operators  of  the  FMS.2  mutation  system  have 
been  specifically  designed  to  detect  statement,  path  and  domain 
errors,  among  others.  During  interactive  use,  an  operator  may 
examine  mutants  not  killed  by  the  current  test  data,  and  generate 
new  input  to  kill  those  particular  mutants.  Starting  with  the 
specifications analysis test data set T_Sp, this technique is
employed to generate 36 test cases (data set T_MA) which kill all
non-equivalent mutants of the program. Thus, ms(TRITYP, T_MA) = 1.
Examination of T_MA reveals test cases similar to those generated
by domain analysis. In fact, many of these test cases are
alternate choices for domain analysis, in that they explore the same
domains and domain boundaries. As an example, the input (2 1 0)
of mutation analysis explores the same domain boundary (see
Figure 1) as (71 40 30), an input from domain analysis: the
boundary region described by the equation J + K + 1 = I.
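
That the two inputs named above lie on the same boundary region can be confirmed directly from the equation. The helper name `on_boundary` is hypothetical.

```python
def on_boundary(i, j, k):
    """True when the input lies on the boundary region J + K + 1 = I."""
    return j + k + 1 == i

print(on_boundary(2, 1, 0), on_boundary(71, 40, 30))   # True True
```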

Conclusion


This  paper  presents  a  technique  for  objectively  evaluating 
the  reliability  of  test  data  generation  methods,  relative  to  a 
particular  program.  It  is  possible  that  for  radically  different 
programs,  different  results  could  be  obtained,  although  our 
previous  studies  have  not  shown  any  particular  sensitivity  to 
program  choice.  For  the  single  program  studied,  three  of  the 
generation  techniques  ranked  in  order  of  the  complexity  of 
program  analysis  that  each  requires  (statement,  branch  and  domain 
analysis).  It  is  an  interesting  point  that  the  fourth  technique, 
specifications  analysis,  was  less  reliable  than  the  relatively 
simple  branch  analysis  —  specifications  analysis  is  so  difficult 
to  apply  effectively  as  to  be  judged  an  art  by  its  proponents 
[10]. The slight difference in scores for domain and minimized
domain analysis suggests that the small loss in reliability of the
latter technique may be effectively sacrificed in return for a
smaller set of test data, an important consideration when test
runs are expensive. The objective reliability measure presented
here can be combined with economic and efficiency considerations,
to  permit  a  data  processing  manager  to  make  an  effective, 
informed  choice  between  testing  methodologies. 
Figure 1.a


Domain analysis: side view of input space.

Triangular, pyramidal region contains inputs
describing legal triangles.


Figure 1.b

Pyramidal region of Figure 1.a, in cross-section
perpendicular to the I = J = K ray.
FIGURE 2


      SUBROUTINE TRITYP(I,J,K,CODE)
C...I,J, AND K ARE SIDES OF THE PROPOSED TRIANGLE
C...CODE RETURNS THE TYPE OF THE TRIANGLE
C...CODE = 1 FOR EQUILATERAL
C...CODE = 2 FOR ISOSCELES
C...CODE = 3 FOR SCALENE
C...CODE = 4 FOR AN IMPOSSIBLE TRIANGLE
C
      INTEGER I,J,K,CODE
      INPUT I,J,K
      RDONLY I,J,K
      OUTPUT CODE
      INTEGER MATCH
C...COUNT MATCHING SIDES
      MATCH = 0
      IF(I.EQ.J)MATCH = MATCH + 100
      IF(I.EQ.K)MATCH = MATCH + 200
      IF(J.EQ.K)MATCH = MATCH + 300
C...SELECT POSSIBLE SCALENE TRIANGLES
      IF(MATCH.EQ.0)GOTO 10
C...SELECT POSSIBLE ISOSCELES TRIANGLES
      IF(MATCH.EQ.100)GOTO 20
      IF(MATCH.EQ.200)GOTO 30
      IF(MATCH.EQ.300)GOTO 40
C...TRIANGLE MUST BE EQUILATERAL
      CODE = 1
      RETURN
C...POSSIBLE SCALENE
   10 IF((I+J).LE.K.OR.(J+K).LE.I.OR.(I+K).LE.J)GOTO 50
      CODE = 3
      RETURN
   20 IF((I+J).LE.K)GOTO 50
      GOTO 60
   30 IF((I+K).LE.J)GOTO 50
      GOTO 60
   40 IF((J+K).LE.I)GOTO 50
      GOTO 60
C...NO TRIANGLE POSSIBLE
   50 CODE = 4
      RETURN
C...ISOSCELES
   60 CODE = 2
      RETURN
      END


FIGURE 3


SUMMARY OF RESULTS

Test Data                  Size of Test   Number of          Mutation
Generation                 Data Set:      Mutants Killed:    Score:
Technique                  |T|            |DM(TRITYP,T)|     ms(TRITYP,T)

Statement Analysis          5             660                .68
Specifications Analysis    13             792                .82
Branch Analysis             9             821                .85
Minimized Domain Analysis  36             943                .976
Domain Analysis            75             951                .984


FIGURE 4


SELECTED MUTANTS SURVIVING DATA SET T_St


MUTANT NUMBER 40

      IF(I .EQ. J) MATCH = MATCH + 100
BECOMES

      IF(I .EQ. J) CODE = MATCH + 100

MUTANT NUMBER 930
      GOTO 60
BECOMES
      CALL TRAP

MUTANT NUMBER 425

   20 IF(I + J .LE. K) GOTO 50
BECOMES

   20 IF(I + 1 .LE. K) GOTO 50


FIGURE  5 


SELECTED MUTANTS SURVIVING DATA SETS T_St AND T_Sp


MUTANT NUMBER 150

30 IF(I + K .LE. J) GOTO 50

BECOMES

30 IF(J + K .LE. J) GOTO 50

MUTANT  NUMBER  899 

30  IF(I  +  K  .LE.  J)  GOTO  50 

BECOMES 

30  IF(I  +  K  .LE.  -ABS  J)  GOTO  50 

MUTANT  NUMBER  1030 
40  IF(J  +  K  .LE.  I)  GOTO  50 
BECOMES 

40  IF(J  +  K  .LE.  I)  GOTO  60 


FIGURE 6


SELECTED MUTANTS THAT SURVIVE T_B


MUTANT NUMBER 98

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50

BECOMES

10 IF(J + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50

MUTANT NUMBER 375

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50

BECOMES

10 IF(I + J .LE. K .OR. 2 + K .LE. I .OR. I + K .LE. J) GOTO 50

MUTANT NUMBER 798

IF(I .EQ. J) MATCH = MATCH + 100
BECOMES

IF(ZPUSH I .EQ. J) MATCH = MATCH + 100


FIGURE 7


MUTANTS SURVIVING BOTH T_MD AND T_D


MUTANT NUMBER 843

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(ZPUSH I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 846

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + ZPUSH J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 852

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. ZPUSH K .OR. J + K .LE. I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 855

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. ZPUSH J + K .LE. I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 858

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. J + ZPUSH K .LE. I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 864

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. J + K .LE. ZPUSH I .OR. I + K .LE. J)

• GOTO 50

MUTANT NUMBER 867

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. ZPUSH I + K .LE. J)

• GOTO 50

MUTANT NUMBER 870

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + ZPUSH K .LE. J)

• GOTO 50

MUTANT NUMBER 876

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. J) GOTO 50
BECOMES

10 IF(I + J .LE. K .OR. J + K .LE. I .OR. I + K .LE. ZPUSH J)

• GOTO 50

MUTANT NUMBER 879

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(ZPUSH I + J .LE. K) GOTO 50

MUTANT NUMBER 882

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(I + ZPUSH J .LE. K) GOTO 50

MUTANT NUMBER 885

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(ZPUSH (I + J) .LE. K) GOTO 50

MUTANT NUMBER 903

40 IF(J + K .LE. I) GOTO 50

BECOMES

40 IF(ZPUSH J + K .LE. I) GOTO 50

MUTANT NUMBER 906

40 IF(J + K .LE. I) GOTO 50

BECOMES

40 IF(J + ZPUSH K .LE. I) GOTO 50

MUTANT NUMBER 909

40 IF(J + K .LE. I) GOTO 50

BECOMES

40 IF(ZPUSH (J + K) .LE. I) GOTO 50

MUTANTS SURVIVING T_MD, KILLED BY T_D

MUTANT NUMBER 419

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(3 + J .LE. K) GOTO 50

MUTANT NUMBER 421

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(2 + J .LE. K) GOTO 50

MUTANT NUMBER 426

20 IF(I + J .LE. K) GOTO 50

BECOMES

20 IF(I + 3 .LE. K) GOTO 50